13 research outputs found

    BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

    The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

    A genotype imputation method for de-identified haplotype reference information by using recurrent neural network.

    Genotype imputation estimates the genotypes of unobserved variants using the genotype data of other observed variants based on a collection of haplotypes for thousands of individuals, which is known as a haplotype reference panel. In general, more accurate imputation results are obtained with a larger haplotype reference panel. Most of the existing genotype imputation methods explicitly require the haplotype reference panel in precise form, but the accessibility of haplotype data is often limited because it requires agreements from the donors. Since de-identified information such as summary statistics or model parameters can be used publicly, imputation methods using de-identified haplotype reference information might be useful to enhance the quality of imputation results when access to the haplotype data is limited. In this study, we proposed a novel imputation method that handles the reference panel as its model parameters by using a bidirectional recurrent neural network (RNN). The model parameters are presented in the form of de-identified information from which the restoration of the genotype data at the individual level is almost impossible. We demonstrated that the proposed method provides imputation accuracy comparable to that of the existing imputation methods on haplotype datasets from the 1000 Genomes Project (1KGP) and the Haplotype Reference Consortium. We also considered a scenario where a subset of haplotypes is made available only in de-identified form for the haplotype reference panel. In the evaluation using the 1KGP dataset under this scenario, the imputation accuracy of the proposed method is much higher than that of the existing imputation methods. We therefore conclude that our RNN-based method is a promising way to further promote the sharing of sensitive genome data under the recent movement for the protection of individuals' privacy.

    Comparison of Kit-Based Metabolomics with Other Methodologies in a Large Cohort, towards Establishing Reference Values

    Metabolic profiling is an omics approach that can be used to observe phenotypic changes, making it particularly attractive for biomarker discovery. Although several candidate metabolite biomarkers for disease expression have been identified in recent clinical studies, the reference values of healthy subjects have not been established. In particular, the accuracy of concentrations measured by mass spectrometry (MS) is unclear. Therefore, comprehensive metabolic profiling in large-scale cohorts by MS to create a database with reference ranges is essential for evaluating the quality of the discovered biomarkers. In this study, we tested 8700 plasma samples by commercial kit-based metabolomics and separated them into two groups of 6159 and 2541 analyses based on the different ultra-high-performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) systems. We evaluated the quality of the quantified values of the detected metabolites from the reference materials in the group of 2541 compared with the quantified values from other platforms, such as nuclear magnetic resonance (NMR), supercritical fluid chromatography tandem mass spectrometry (SFC-MS/MS) and UHPLC-Fourier transform mass spectrometry (FTMS). The values of the amino acids were highly correlated with the NMR results, and lipid species such as phosphatidylcholines and ceramides showed good correlation, while the values of triglycerides and cholesterol esters correlated less well with the lipidomics analyses performed using SFC-MS/MS and UHPLC-FTMS. The evaluation of the quantified values by MS-based techniques is essential for metabolic profiling in a large-scale cohort.

    Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2).

    Identification of the population frequencies of definitely pathogenic germline variants in two major hereditary breast and ovarian cancer syndrome (HBOC) genes, BRCA1/2, is essential to estimate the number of HBOC patients. In addition, the identification of moderately penetrant HBOC gene variants that contribute to increasing the risk of breast and ovarian cancers in a population is critical to establish personalized health care. A prospective cohort subjected to genome analysis can provide both sets of information. Computational scoring and prospective cohort studies may help to identify such likely pathogenic variants in the general population. Using InterVar software, we annotated the variants in the BRCA1 and BRCA2 genes from a dataset of 3,552 whole-genome sequences obtained from members of prospective cohorts with genome data in the Tohoku Medical Megabank Project (TMM). Computational impact scores (CADD_phred and Eigen_raw) and minor allele frequencies (MAFs) of pathogenic (P) and likely pathogenic (LP) variants in ClinVar were used as filtering criteria. Familial predispositions to cancers among the 35,000 TMM genome cohort participants were analyzed to verify the identified pathogenicity. Seven potentially pathogenic variants were newly identified. The sisters of carriers of these moderately deleterious variants and definite P and LP variants among members of the TMM prospective cohort showed a statistically significant preponderance for cancer onset, based on self-reported cancer history. Filtering by computational scoring and MAF is useful to identify potentially pathogenic variants in BRCA genes in the Japanese population. These results should help to follow up the carriers of variants of uncertain significance in the HBOC genes in the longitudinal prospective cohort study.

    Regional genetic differences among Japanese populations and performance of genotype imputation using whole-genome reference panel of the Tohoku Medical Megabank Project

    Background: Genotype imputation from single-nucleotide polymorphism (SNP) genotype data using a haplotype reference panel consisting of thousands of unrelated individuals from populations of interest can help to identify strongly associated variants in genome-wide association studies. The Tohoku Medical Megabank (TMM) project was established to support the development of precision medicine, together with the whole-genome sequencing of 1070 human genomes from individuals in the Miyagi region (Northeast Japan) and the construction of the 1070 Japanese genome reference panel (1KJPN). Here, we investigated the performance of 1KJPN for genotype imputation of Japanese samples not included in the TMM project and compared it with other population reference panels. Results: We found that the 1KJPN population was more similar to other Japanese populations, Nagahama (south-central Japan) and Aki (Shikoku Island), than to East Asian populations in the 1000 Genomes Project other than JPT, suggesting that the large-scale collection (more than 1000) of Japanese genomes from the Miyagi region covered many of the genetic variations of Japanese in mainland Japan. Moreover, 1KJPN outperformed the phase 3 reference panel of the 1000 Genomes Project (1KGPp3) for Japanese samples, and 1KJPN showed similar imputation rates for the TMM and other Japanese samples for SNPs with minor allele frequencies (MAFs) higher than 1%. Conclusions: 1KJPN covered most of the variants found in the samples from areas of the Japanese mainland outside the Miyagi region, implying that 1KJPN is representative of the Japanese population's genomes. 1KJPN and successive reference panels are useful genome reference panels for the mainland Japanese population. Importantly, the addition of whole genome sequences not included in the 1KJPN panel improved imputation efficiencies for SNPs with MAFs under 1% for samples from most regions of the Japanese archipelago.